Download: make it retry more times and log the first error from many#1603
Download: make it retry more times and log the first error from many#1603yarikoptic merged 5 commits intomasterfrom
Conversation
Some users do experience odd timeouts from S3. Most likely due to some local effects, but I think we should strive more to ensure we try our best to download. So we will keep retrying more times.
I think we "over-optimized" minimization of messages in some prior PR and it is hard or impossible to find actual error or even for which file/zarr. So I decided to add an explicit error log line to ease debugging
There was a problem hiding this comment.
Pull Request Overview
This PR improves the download functionality by logging the first encountered error for easier debugging and increasing the number of retry attempts to enhance robustness.
- Logs the first error message before raising a RuntimeError.
- Increases retry attempts from 3 to 10 in the download function.
Comments suppressed due to low confidence (2)
dandi/download.py:217
- [nitpick] Consider whether logging only the first error provides sufficient context for debugging; if additional context could be useful, logging more or all errors might be helpful.
error_msg = f"Encountered {pluralize(len(errors), 'error')} while downloading."
dandi/download.py:747
- [nitpick] Consider extracting the magic number '10' into a well-named constant (e.g., MAX_DOWNLOAD_ATTEMPTS) to improve code clarity and maintainability.
10 # number to do, could be incremented if we downloaded a little
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1603 +/- ##
==========================================
+ Coverage 88.54% 88.64% +0.10%
==========================================
Files 78 78
Lines 10872 10872
==========================================
+ Hits 9627 9638 +11
+ Misses 1245 1234 -11
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
candleindark
left a comment
There was a problem hiding this comment.
Let comment not related to the changes you put in but about some existing code for downloading.
For its new return type which doesn't have the `None` value
|
I updated the test. This PR is good to go. I still see the following connection failure in one of the tests. connection failure================================== FAILURES ===================================
____________________________ test_follow_redirect _____________________________
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:787: in urlopen
response = self._make_request(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:488: in _make_request
raise new_e
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:464: in _make_request
self._validate_conn(conn)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:1093: in _validate_conn
conn.connect()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:741: in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:920: in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:460: in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:504: in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:513: in wrap_socket
return self.sslsocket_class._create(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1071: in _create
self.do_handshake()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1342: in do_handshake
self._sslobj.do_handshake()
E ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
During handling of the above exception, another exception occurred:
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\adapters.py:667: in send
resp = conn.urlopen(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:841: in urlopen
retries = retries.increment(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\retry.py:474: in increment
raise reraise(type(error), error, _stacktrace)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\util.py:38: in reraise
raise value.with_traceback(tb)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:787: in urlopen
response = self._make_request(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:488: in _make_request
raise new_e
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:464: in _make_request
self._validate_conn(conn)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connectionpool.py:1093: in _validate_conn
conn.connect()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:741: in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\connection.py:920: in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:460: in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\urllib3\util\ssl_.py:504: in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:513: in wrap_socket
return self.sslsocket_class._create(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1071: in _create
self.do_handshake()
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\ssl.py:1342: in do_handshake
self._sslobj.do_handshake()
E urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
During handling of the above exception, another exception occurred:
dandi\tests\test_dandiarchive.py:461: in test_follow_redirect
url = follow_redirect("https://bit.ly/dandi12")
dandi\dandiarchive.py:894: in follow_redirect
r = requests.head(url, allow_redirects=True)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\api.py:100: in head
return request("head", url, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\api.py:59: in request
return session.request(method=method, url=url, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:589: in request
resp = self.send(prep, **send_kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:724: in send
history = [resp for resp in gen]
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:724: in <listcomp>
history = [resp for resp in gen]
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:265: in resolve_redirects
resp = self.send(
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\sessions.py:703: in send
r = adapter.send(request, **kwargs)
C:\hostedtoolcache\windows\Python\3.10.11\x64\lib\site-packages\requests\adapters.py:682: in send
raise ConnectionError(err, request=request)
E requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError([1005](https://github.com/dandi/dandi-cli/actions/runs/14249292407/job/39937930698?pr=1603#step:8:1006)4, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
=============================== tests coverage ================================
______________ coverage: platform win32, python 3.10.11-final-0 _______________
Coverage XML written to file coverage.xml
============================ slowest 10 durations =============================
75.12s call dandi/tests/test_dandiarchive.py::test_follow_redirect
71.61s setup dandi/cli/tests/test_service_scripts.py::test_reextract_metadata
19.13s call dandi/tests/test_metadata.py::test_bids_nwb_metadata_integration
8.48s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-dry]
3.90s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-symlink]
3.90s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-hardlink]
3.90s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-copy]
3.89s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-auto]
3.88s call dandi/tests/test_organize.py::test_organize_nwb_test_data[1-simulate]
3.27s setup dandi/cli/tests/test_cmd_validate.py::test_validate_nwb_path_grouping
=========================== short test summary info ===========================
FAILED dandi/tests/test_dandiarchive.py::test_follow_redirect - requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
= 1 failed, 451 passed, 282 skipped, 1 xfailed, 9 xpassed in 284.33s (0:04:44) =The failure is coming from dandi\tests\test_dandiarchive.py:461: in test_follow_redirect
url = follow_redirect("https://bit.ly/dandi12")
dandi\dandiarchive.py:894: in follow_redirect
r = requests.head(url, allow_redirects=True)and it is unrelated to this PR. |
|
thanks for fixing up tests @candleindark ok, we might be done with it, unless someone would want to add a unittest to actually add coverage for failing to download attempts testing -- apparently that is not covered at all :-/ but I have no spare cycles atm, so may be later. final blessing @candleindark ? |
|
windows fail - filed |
|
oh it was already approved -- merging ;) |
|
🚀 PR was released in |
See individual commits for more information.